Practice of the new supervised machine learning predictive analytics for glioma patient survival after tumor resection: Experiences in a high-volume Chinese center

Objective This study aims to assess the effectiveness of the Gradient Boosting (GB) algorithm on glioma prognosis prediction and to explore new predictive models for glioma patient survival after tumor resection. Methods A cohort of 776 glioma cases (WHO grades II–IV) between 2010 and 2017 was obtained. Clinical characteristics and biomarker information were reviewed. Subsequently, we constructed the conventional Cox survival model and three different supervised machine learning models, including support vector machine (SVM), random survival forest (RSF), Tree GB, and Component GB. Then, the model performance was compared with each other. At last, we also assessed the feature importance of models. Results The concordance indexes of the conventional survival model, SVM, RSF, Tree GB, and Component GB were 0.755, 0.787, 0.830, 0.837, and 0.840, respectively. All areas under the cumulative receiver operating characteristic curve of both GB models were above 0.800 at different survival times. Their calibration curves showed good calibration of survival prediction. Meanwhile, the analysis of feature importance revealed Karnofsky performance status, age, tumor subtype, extent of resection, and so on as crucial predictive factors. Conclusion Gradient Boosting models performed better in predicting glioma patient survival after tumor resection than other models.


Introduction
Glioma is the most widely recognized primary tumor in the central nervous system (CNS) (1). Accounting for around 80% of malignant CNS tumors (1), gliomas are composed of lowergrade gliomas [LGGs; World Health Organization (WHO) grades II and III] and grade IV gliomas (glioblastoma, GBM). The treatment of glioma is troublesome, and tumor resection is the main approach to treatment. Due to the large heterogeneity between different kinds of gliomas, the prognosis of glioma patients is diverse, and the survival always ranges from a few months to 10 years (2,3). Obviously, GBM was supposed to have a poorer prognosis than diffuse low-grade and intermediate-grade gliomas for its characteristics of invading growth and easy recurrence. However, along with the presence of certain molecular markers and various clinical characteristics, including age, Karnofsky performance status (KPS), symptoms, and so on, the prognosis varies even in this most malignant type of glioma, GBM. Predicting glioma patient survival after tumor resection still remains a great challenge for clinical doctors.
Nowadays, there have been endeavors, mainly in three directions, to explore useful predictive models for glioma prognosis. Some researchers have focused on traditional multivariate Cox regression models with several certain prognostic factors. For example, Gittleman et al. (4) developed a survival nomogram for LGGs with independent validation. Meanwhile, some turn to new biomarkers for the construction of models. Not long ago, Zhang et al. (5) constructed a novel model using immune-related gene signature, which is also effective in predicting overall survival in primary LGG. What is more, some researchers have concentrated on radiomics feature prediction models and made some achievements (6). Albeit the effort in putting forward these models, some shortcomings limit their usefulness and availability of these models. First, the traditional statistic approach has a huge limitation: its analysis is based on the condition of a linear relationship and might miss the nonlinear relationship between input and outcome. In other words, this approach cannot fully use medical information, which makes it unable to adjust to the era of big data. Second, as Jakola et al. (7) claimed, a pure biomarker approach for prediction, such as gene signature model, is of limited value because tumor classes and tumor cells are neither stable over time nor homogeneous throughout the lesion tissue. Third, prediction models based on radiomics features are powerful and promising, but we acknowledge that the techniques are at an early stage and available only at a limited number of centers and not readily validated in medical practice yet (7). Therefore, it is still necessary to explore a new predictive model based on the algorithm suited to the big-data era, with the combination of common clinical features and reliable biomarkers as prognostic factors.
Recently, supervised machine learning (ML) methods have demonstrated precise predictive capacity, being progressively utilized in the prognosis prediction of different diseases (8). The supervised ML approach is a kind of data-driven analysis method, including support vector machine (SVM) (9), decision tree (10), and so on, which integrates multiple risk factors into a predictive algorithm and performs well with complex information (11). Gradient Boosting (GB) is one of the supervised ML algorithms. Although it was strange for medical workers, this ML algorithm did have a good performance in medical scenes, such as predicting the survival outcome of triple-negative breast cancer (12) and the recurrence of colorectal cancer (13). So far, studies seldom used Gradient Boosting to analyze and predict glioma prognosis. This study was conducted to assess its effectiveness on glioma prognosis prediction and to explore new predictive models for glioma patient survival after tumor resection.

Patients
Approved by the Institutional Review Board of Sun Yat-sen University, this study was conducted in the Neurosurgery unit, the First Affiliated Hospital, Sun Yat-sen University, a high-volume central center that performs approximately 100 glioma surgeries yearly. In accordance with the guidelines for retrospective study in our institution, the institutional review board waived the requirement for patients' informed consent. Our study only included cases of astrocytoma, oligodendroglioma and oligoastrocytoma, anaplastic astrocytoma, oligodendroglioma and oligoastrocytoma, and glioblastoma. A cohort of 776 glioma cases (WHO grades II-IV) between 2010 and 2017 was obtained. This consecutive malignant series consisted of 74 cases of WHO grade III (anaplastic astrocytoma, oligodendroglioma, and oligoastrocytoma), 268 cases of WHO grade IV (glioblastoma), and 434 cases of WHO grade II (astrocytoma, oligodendroglioma, and oligoastrocytoma).

Clinical characteristics
Most data were accessible through the hospital database. All data were extracted into two copies of a standardized form by two research assistants independently and integrated into the final file version by a third. Discrepancies were discussed and resolved by consensus. The extracted characteristics include age at surgery, gender, symptoms (seizures, headaches/dizziness, nausea/vomiting, limb dysfunction, blurred vision, or other cranial nerve deficit), duration of the first presenting symptom, preoperative KPS, tumor size and location, time of surgery, extent of resection (gross-total resection and others), tumor subtype, treatment after surgery (chemotherapy and/or radiotherapy), survival status (alive or dead), and survival/follow-up time. The subtype of glioma was reviewed by a pathologist according to the latest 2016 WHO criteria (14). The deficit of motor, visual, or cranial never function was confirmed by the proof of physical examination, diffusion tensor imaging (DTI)-based tractography, and so on. The same as the definition by Okamoto et al. (15), the extent of resection was categorized, where gross-total resection was defined as residual tumor less than 5%. The follow-up data were collected until December 2019. Survival/follow-up time was calculated from the date of tumor resection to death (any cause) or censor (still survived) in December 2019. All patients were followed up at the regular interval of 3 months for the initial 3 years and afterward followed every 1 year until death. The last follow-up for every single accessible patient was finished in December 2019.

Biomarkers
Biomarkers' detection, including immunohistochemistry (IHC) and molecular genetics, was performed on histological specimens that were obtained at the time of resection surgery prior to chemotherapy and/or radiotherapy treatment. The detection of kit67, p53, vimentin, and glial fibrillary acidic protein (GFAP) was performed using immunohistochemical stains in glioma by standard techniques that were described previously (16). For the specimen with p53 immunohistochemical stain, the presence of strong positive tumor nuclei in more than 10% of cells was marked as immunopositive, which indicated the mutational status of TP53 (17). The immunopositivity of vimentin was identified when more than 25% of tumor nuclei stained positive with vimentin IHC stain. GFAP immunopositivity was marked when any tumor nuclei were positive with GFAP IHC stain. The Ki-67 index was recorded as the average percentage of the positive ones on the total number of nuclei at 400× magnification, where "≥10%" represented high Ki-67 expression (18). As for molecular genetics, the biomarker we detected was methylation of the O 6 -methylgaunine-DNA-methyltransferase (MGMT) promoter. This test was done using methylation-specific PCR.
Supervised machine learning algorithm SVM, as a machine learning algorithm, has been widely used in the prognosis of diseases. Decision tree is a well-known ML approach for statistical problems, which represents the mapping relationship between properties and values. It consists of a root node, internal nodes, and leaf nodes, where leaf nodes correspond to values represented by the path from the root node to the leaf node. Decision tree can be used for survival analysis (19). Here, we used survival decision tree as the base learner of random forests (RFs). RF is an ensemble tree method whose final prediction is the average of all predictions from every tree in the forest. RF performs better in prediction than a single tree because a combination of predictions from separate methods could substantially promote prediction performance (20). Random survival forest (RSF) is an adaptation of random forest, which is designed for the analysis of survival data (21).
GB is an ML technique that can be used for survival analysis. Here, we used component-wise least squares and survival decision tree as two types of base learners, respectively. The Gradient Boosting algorithm produces different weak prediction models (for instance, component-wise least squares) at each step and combines them into a total model at different weights. The prediction of the weak model that Gradient Boosting produced at each step generates a unanimous gradient direction of the loss function. The details have been described previously (22).

Model evaluation
Harrell's concordance index (c-index), defined as the ratio of correctly ordered (concordant) pairs to comparable pairs, is a measure of the rank correlation between predicted risk scores and observed time points. A value of 1 refers to perfect prediction, while a value of 0.5 means that prediction does not perform better than random guessing.
The area under the receiver operating characteristic curve (ROC curve) is often used to assess the discrimination of the binary classification model. When extending the ROC curve to survival time, it gives rise to the time-dependent cumulative ROC curve at a certain survival time t. The area under the cumulative ROC curve (AUC) at time t indicates how well a model can distinguish subjects who will experience an event by time t from those who will not.
The calibration curve is a graphical measure of the calibration of the model, which is a linear plot with the predicted event on the x-axis and the observed event on the y-axis. Good calibration would be matched by a regression line with a 45°slope.
To fully capture the true utility of a prediction model, the sensitivity and specificity of models for predicting 6-, 12-, 36-, and 60-month survival were calculated after determining the optimal threshold through the ROC curve.

Model construction
All clinical characteristics and biomarker information were included in the model training set as variables. The missing values of variables were filled with multiple imputations. Here, we randomly split the data into a training set and a test set at an 8:2 ratio using the train_test_split function in the scikit-learning module of Python (version 3.7). The scikit-survival module (version 0.12.1) was used to construct ML models, SVM, RF, Tree GB, and Component GB. ML algorithms involve many hyperparameters that are significant for performance prediction. The optimal combination of hyperparameters was determined using the method of grid search. During every cross-validation, 1/3 of the data in the training set were randomly excluded as out-ofbag (OOB) data for validation. For different combinations of hyperparameters, the mean c-index on the validation data was calculated after 50 times cross-validation. The hyperparameter combination with the best c-index was selected as optimum. After constructing the model, we usually assess the feature importance by calculating its contribution to the c-index, namely, the decrease of c-index after discombobulating the relationship of this feature with survival.
To compare the performance difference between ML models and conventional survival models, we also built the Cox proportional hazards model. Three continuous variables, age at surgery, preoperative KPS, and tumor size, were transformed into categorical variables to obtain the best model prediction performance. Cutoffs for these variables were 50 years, 70 cm, and 55 cm, respectively. All variables were entered into the model step by step, and the final model only included variables with a significant risk ratio.

Statistical analysis
Mean ± SD or median (IQR) was chosen to describe continuous variables regarding their statistical distribution, while categorical variables were expressed in the form of example numbers (%). P < 0.05 was set as the criteria of statistical significance in all analyses. The confidence interval (CI) of the AUC was computed by the bootstrap method, while 95% CI was computed with 2,000 stratified bootstrap replicates. The comparisons of the c-index between different models were conducted using the R package Survcomp.

Characteristic overview
The sociodemographic and characteristics of the study population are presented in Table 1. The most frequent symptom Li et al. 10.3389/fsurg.2022.975022 Frontiers in Surgery was headaches or dizziness, while the most frequent tumor location and subtype were parietal lobe and diffuse astrocytoma, respectively. The medians of the duration of the first presenting symptom, preoperative KPS, and tumor size were 1.90 months, 60 mm, and 45.00 mm, respectively. Gross-total resection was adopted in 90.0% of patients. The immunopositivity of GFAP, Vimentin, and p53 was observed in more than half of the patients. The median of survival time was 32.65 months. Figure 1 shows the correlation coefficient between each independent variable. It demonstrated low correlation between each variable.

Model performance
The flow chart of model construction is shown in Supplementary Figure S1. Supplementary Table S1 shows the detailed descriptions of the selected modules, classes, and hyperparameters in Python for each model, including the Cox survival model and supervised ML models.
The five models are compared in Table 2. The Cox proportional hazards model had the worst performance, with a concordance index of 0.755 for the test set. The SVM model was observed to have relatively poor performance, with a c-index of 0.787 for the test set. The Tree GB survival model ranked second, with a c-index of 0.837, while the RSF model ranked third, with a c-index of 0.830. The Component GB survival model had the best prediction performance, with a c-index of 0.840. In addition, we also compared the c-index values of different models on the test set.  The c-index values of Tree GB and Component GB survival models were significantly higher than those of the Cox proportional hazards model (P < 0.05) and SVM model (P < 0.05). Although the comparison results of the RF model were not significant (P values were 0.332 and 0.112, respectively), relatively superior performances of Tree GB and Component GB models were still observed. The reason for no statistical significance could be attributed to the little sample size of the test set to a certain extent. Figure 2 shows the AUCs of both GB models. All AUCs at different survival times were above 0.800, which indicated the excellent discrimination of models. The prediction AUC and CI values of both GB models' for 6-, 12-, 36-, and 60-month survival are specifically listed in Supplementary Tables S2 and S3, which highlighted superior predictive performance. The calibration curves of both GB models are shown in Supplementary Figures S2 and S3, where good calibration was found in survival prediction. Based on the optimal thresholds, the Tree GB model predicted 6-, 12-, 36-, and 60-month survival with 94.4%, 90.6%, 99.3%, and 100% sensitivity and 91.3%, 85.7%, 71.3%, and 73.8% specificity, while the Component GB model predicted the survival results with 90.0%, 73.5%, 85.2%, and 90.4% sensitivity and 84.1%, 92.9%, 87.5%, and 82.8% specificity, respectively. The results are listed in Supplementary Tables S4 and S5.

Feature importance
From Table 3, we found that KPS, the tumor subtype of glioblastoma not otherwise specified (NOS), age, tumor size, and the tumor subtype of oligodendroglioma (NOS) ranked top five in the Tree GB survival model in terms of the feature importance. As for the Component GB survival model, it was KPS, the tumor subtype of glioblastoma (NOS), age, extent of resection, and tumor size that ranked the top five. As described in Supplementary Figure S4, significant variables included in the final Cox proportional hazards model were KPS, age, tumor size, tumor subtype, extent of resection, chemotherapy, radiotherapy, p53 immunopositivity, and methylation of the MGMT promoter.

Sensitivity analysis
We also performed sensitivity analysis to detect the robustness of ML model prediction performance. Training and testing with variables without imputation, the c-indexes of ML survival models are listed in Table 4. All c-indexes were at a high level (above Correlation coefficient matrix of each variable. Each coefficient is annotated. The closer it gets to 1, the more positively correlated it is. The closer it gets to −1, the more negatively correlated it is.

Discussion
This study was designed to assess the effectiveness of the supervised ML algorithm, especially Gradient Boosting, in the prediction of glioma patient survival after tumor resection and to explore new predictive models useful for medical workers. Judging by Harrell's concordance index of the training set and test set, the Gradient Boosting algorithm ranked first on prediction performance. There were differences in prediction performance between Tree GB and Component GB algorithms. Tree GB showed better performance on the training set (c-index: 0.905) but worst performance on the test set (c-index: 0.837) than Component GB, which implied a trend of overfitting. By the way, considering the discrimination and calibration through time-dependent cumulative ROC curves and sensitivity, specificity, and calibration curves, Tree and Component GB were both good at predicting 6-, 12-, 36-, and 60-month survival after surgery. The results of the sensitive analysis revealed that both GB models were stable at the prediction outcome. At the same time, the feature importance of ML models was also assessed. The top 15 important features in Tree or Component GB models could be reduced to nine variables, namely, KPS, age, tumor size, tumor subtype, extent of resection, chemotherapy, radiotherapy, p53 immunopositivity, and methylation of the MGMT promoter. It was in line with the significant variables included in the Cox proportional hazards model.
Consistent with the results of previous studies (23,24), our result revealed that KPS influenced glioma patient survival after resection surgery. KPS or similar crude scales are commonly seen methods to evaluate gross functional status and have been repeatedly described as prognostic factors in the management of glioma patients (23,24). Also, age is one of the most established prognostic factors in patients with malignant gliomas, regardless of lower-grade (24,25) or higher-grade gliomas (26). As claimed by Paugh et al. (27), the substantial differences in the molecular features underlying age-stratified gliomas might lead to different treatment responses, accounting for different survival outcomes. A cutoff value of 55 years has been reported repeatedly to stratify glioma patients, while significantly impaired survival is always observed in those 55 years and above. Here, our study confirmed advanced age as an unfavorable prognostic factor once more. Previous studies (28,29) have shown a strong association between preoperative tumor size and glioma survival, which is in line with the finding of our research. Regarding the extent of resection, complete curative resection is thought impossible due to the lack of clear tumor borders and the invasive behavior of the tumor. Although a number of studies (30, 31) have demonstrated that maximal resection substantially improves progression-free and overall survival, it has also been reported that aggressive glioma resection might increase the risk of postoperative complications and lead to worse survival prognosis. Therefore, the relationship between the extent of surgical resection and patient outcome still remains controversial. Even so, our results showed a positive correlation between gross-total resection and prognosis improvement for patients compared to partial resection or biopsy. In view of chemotherapy and radiotherapy, they are crucial elements in the treatment plan of glioma patients. Postoperative adjuvant radiotherapy and chemotherapy have always been recommended to start within 2-4 weeks after surgical resection and have proven to be significant prognostic factors by previous studies (32, 33) and this study. Tumor subtype is one the most commonly recognized prognostic factors, and the subtype based on the latest 2016 WHO criteria helps to predict patient prognosis more accurately. Here, our research also served as evidence of the critical role of tumor subtype in glioma management.